Ontology driven content extraction using interlingual annotation of texts in the OMNIA project

Authors

  • Achille Falaise
  • David Rouquet
  • Didier Schwab
  • Hervé Blanchon
  • Christian Boitet
Abstract

OMNIA is an ongoing project that aims to retrieve images accompanied by multilingual texts. In this paper, we propose a generic method (language- and domain-independent) to extract conceptual information from such texts and from spontaneous user requests. First, texts are labelled with interlingual annotations; then a generic extractor, taking a domain ontology as a parameter, extracts the relevant conceptual information. An implementation is also presented, along with a first experiment and preliminary results.
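
The two-step idea in the abstract (annotate first, then extract with the ontology passed in as a parameter) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the project's actual implementation: the `AnnotatedToken` type, the `il:` lexeme identifiers, the dictionary-shaped ontology, and the `extract_concepts` function are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedToken:
    """A word plus its candidate interlingual lexemes (hypothetical id format)."""
    surface: str
    lexemes: tuple[str, ...]


def extract_concepts(tokens: list[AnnotatedToken],
                     ontology: dict[str, str]) -> list[tuple[str, str]]:
    """Return (surface, concept) pairs for tokens that map into the ontology.

    The extractor never inspects the source language: it only looks at the
    interlingual lexemes, and the domain is fixed solely by `ontology`.
    """
    found = []
    for token in tokens:
        for lexeme in token.lexemes:
            concept = ontology.get(lexeme)
            if concept is not None:
                found.append((token.surface, concept))
                break  # keep the first candidate lexeme that the ontology knows
    return found


# Toy domain ontology (assumed): interlingual lexeme -> concept.
ontology = {"il:dog": "Animal/Dog", "il:beach": "Place/Beach"}

# The same content annotated from two languages (toy data).
french = [
    AnnotatedToken("chien", ("il:dog",)),
    AnnotatedToken("sur", ()),
    AnnotatedToken("plage", ("il:range", "il:beach")),
]
english = [
    AnnotatedToken("dog", ("il:dog",)),
    AnnotatedToken("on", ()),
    AnnotatedToken("beach", ("il:beach",)),
]

french_concepts = [c for _, c in extract_concepts(french, ontology)]
english_concepts = [c for _, c in extract_concepts(english, ontology)]
```

In this toy setup both inputs yield the same concept list, and the ambiguous French token is resolved only because one of its candidate lexemes appears in the ontology. Swapping in a different dictionary would retarget the extractor to another domain without changing its code.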

Similar resources

Semantic Annotation for Interlingual Representation of Multilingual Texts

This paper describes the annotation process being used in a multi-site project to create six sizable bilingual parallel corpora annotated with a consistent interlingua representation. After presenting the background and objectives of the effort, we describe the multilingual corpora and the three stages of interlingual representation being developed. We then focus on the annotation process itsel...

The PIA Project: Learning to Semantically Annotate Texts from an Ontology and XML-Instance Data

The development of the XML and RDF(S) standards offer a positive environment for machine learning to enable the automatic XML-annotation of texts that can encourage the extension of Semantic Web applications. After reviewing the current limitations of information extraction technology, specifically its lack of portability to new domains, we introduce the PIA project for automatically XML-annota...

Interlingua Development and Testing through Semantic Annotation of Multilingual Text Corpora

This paper describes a multi-site project to annotate the interlingual content of six sizable bilingual parallel corpora. The project addresses several principal problems in parallel: specification of interlingua content and notation, development of reliable annotation methods, and evaluation of annotated corpora. As a by-product, a growing corpus of annotated texts is being produced, which may...

Interlingual annotation of parallel text corpora: a new framework for annotation and evaluation

This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language...

Ontology-enablement of a system for semantic annotation of digital documents

We describe the recent enhancement of the CAFETIERE formalism (Conceptual Annotation of Facts, Events, Terms, Individual Entities and RElations) with the ability to link natural language words and phrases in textual documents with instances and classes from a language-enabled ontology. The language-enabled ontology is one with an index from one or more natural language expressions to each conce...

Publication date: 2010